PADRE | A Parallel Document Retrieval Engine

Author

  • David Hawking
Abstract

Developments in text retrieval on the AP1000 since last year's PCW are reported. The software, now called PADRE, has been entered in the competition associated with the 1994 Text Retrieval Conference (TREC-3). PADRE is now capable of document relevance estimation and ranking, and supports loading data from and dumping data to the Fujitsu Local Filesystem. A new load-balancing operation has been devised and implemented, and improved techniques for handling cell-program error conditions have been adopted. Experiments have been carried out successfully on document collections exceeding 1.5 million documents and 5 gigabytes of data. Performance results are presented.


Similar resources

The Design And Implementation Of A Parallel Document Retrieval Engine

Document retrieval as traditionally formulated is an inherently parallel task because the document collection can be divided into N sub-collections, each of which may be searched independently. Document retrieval software can potentially exploit the power and capacity of a large-scale parallel ma...


A Parallel Document Retrieval Server For The World

An architecture is proposed which enables the Parallel Document Retrieval Engine (PADRE), running on a single-user Fujitsu AP1000 multicomputer, to operate as an information server on the World Wide Web. The advantages and disadvantages of a distributed memory parallel machine for this purpose are discussed and the likely applicability to different types of parallel machine is considered. Ideas...


A Parallel Document Retrieval Server For The World Wide Web

An architecture is proposed which enables the Parallel Document Retrieval Engine (PADRE), running on a single-user Fujitsu AP1000 multicomputer, to operate as an information server on the World Wide Web. The advantages and disadvantages of a distributed memory parallel machine for this purpose are discussed and the likely applicability to different types of parallel machine is considered. Ideas ...


PADRE for COWs

Earlier work with the Parallel Document Retrieval Engine was oriented toward parallel machines such as the AP1000, characterised by many nodes, few disks, small memory per node (by current standards), single-user operation and high communication performance, relative to node computational power. Present generation parallel machines are much more like clusters of workstations (COWs). There are t...


Searching For Meaning With The Help Of A PADRE

Full-text scanning offers significant advantages over other methods of document retrieval but is normally too slow for use on large collections. The Fujitsu AP1000 parallel distributed-memory machine has been used to reduce the time penalty for full-text scanning to acceptable interactive levels. The query language for the retrieval software (called PADRE) is described herein and differences betwe...



Publication date: 1994